Although rodents are important reservoirs for RNA viruses, to date only one species of rodent 
coronavirus (CoV) has been identified. Herein, we describe a new CoV, denoted Lucheng Rn rat 
coronavirus (LRNV), and novel variants of two Betacoronavirus species termed Longquan Aa mouse 
coronavirus (LAMV) and Longquan Rl rat coronavirus (LRLV), that were identified in a survey of 1465 
rodents sampled in China during 2011-2013. Phylogenetic analysis revealed that LAMV and LRLV fell into 
lineage A of the genus Betacoronavirus, which included CoVs discovered in humans and domestic and 
wild animals. In contrast, LRNV harbored by Rattus norvegicus formed a distinct lineage within the genus 
Alphacoronavirus in the 3CLpro, RdRp, and Hel gene trees, but formed a more divergent lineage in the N 
and S gene trees, indicative of a recombinant origin. Additional recombination events were identified in 
LRLV. Together, these data suggest that rodents may carry additional unrecognized CoVs. 

© 2014 Elsevier Inc. All rights reserved. 


Introduction 

Coronaviruses (CoVs; family Coronaviridae) are the etiological
agents of respiratory, enteric, hepatic, and neurological diseases
in animals and humans. The first coronavirus (infectious bronchitis
virus) was isolated in chicken embryos in 1937 (Beaudette and
Hudson, 1937), with subsequent viral isolations in rodents, domestic
animals, and humans. However, until the emergence of severe
acute respiratory syndrome (SARS) in China in 2002/3 (Drosten et
al., 2003; Woo et al., 2009), coronaviruses had been of greater
concern to agriculture than public health. Since the discovery of
SARS-CoV, intense scientific efforts have been directed toward
characterizing additional coronaviruses in humans and other
animals (Drexler et al., 2010; Guan et al., 2003; Lau et al., 2005;
Li et al., 2005; Quan et al., 2010; van der Hoek et al., 2004; Woo
et al., 2012). As a consequence, the number of coronaviruses
identified has increased rapidly (Woo et al., 2009, 2012). Of
particular importance was the recent discovery of a new severe
respiratory illness with renal failure (Middle East Respiratory
Syndrome, MERS) caused by a novel coronavirus (MERS-CoV)
(Bermingham et al., 2012; van Boheemen et al., 2012), which
is also a zoonosis (Annan et al., 2013; Azhar et al., 2014; Reusken
et al., 2013). It is highly likely that there are additional unrecognized
coronaviruses circulating in animals.

Rodentia (rodents) is the largest order of mammals with 
approximately 2277 species worldwide, representing some 42% 
of all mammalian species (Wilson and Reeder, 2005). Rodents are 
a major zoonotic source of human infectious diseases (Meerburg 
et al., 2009; Luis et al., 2013), particularly as they often live at high 
densities and hence may harbor high levels of microbial diversity 
(Moya et al., 2004). In addition, some rodent species live in close 
proximity to humans, such that they represent an important 
zoonotic risk. To date, however, only one species of coronavirus 
- Murine coronavirus - has been associated with rodents 
(de Groot et al., 2011). The prototype virus, which was named 
mouse hepatitis virus (MHV), was first isolated in mice in 1949 
(Cheever et al., 1949), with a variant then identified in rats in 1970
(where it was termed rat sialodacryoadenitis coronavirus) (Parker
et al., 1970). No other rodent-associated CoVs have been discovered
since this time.

Although RNA viruses are often characterized by their high 
rates of mutation, recombination may also be of evolutionary 
importance, and has been associated with such characteristics as 
the ability to infect new hosts and alter virulence (Holmes, 2013). 
Recombination appears to be commonplace in coronaviruses
(Graham and Baric, 2010; Jackwood et al., 2012; Keck et al.,
1987; Woo et al., 2006), and may facilitate their emergence.
For example, of the two types of feline CoV (FCoV), type II
arose by double recombination events between FCoV type I
and canine coronavirus (CCoV) (Herrewegh et al., 1998). Similarly,
recombination generated the three genotypes (A, B, and C) of
human coronavirus HKU1 (Woo et al., 2006), and homologous
recombination has occurred in the evolutionary history of SARS-CoV
(Graham and Baric, 2010).


Table 1

Prevalence of coronaviruses in rodents in Zhejiang Province, China.

                        Longquan              Wencheng              Lucheng
Species                 Residential  Field    Residential  Field    Residential  Total
Apodemus agrarius       -            10/427   -            0/17     -            10/444
Mus musculus            0/3          -        0/4          -        -            0/7
Microtus fortis         0/44         0/261    -            -        -            0/305
Micromys minutus        -            0/2      -            -        -            0/2
Niviventer confucianus  -            1/58     -            0/27     -            1/85
Rattus norvegicus       3/214        -        0/31         -        1/17         4/262
R. lossea               -            14/300   -            0/1      -            14/301
R. tanezumi             0/25         1/7      0/18         -        0/3          1/53
R. fulvescens           0/1          0/3      -            -        -            0/4
R. edwardsi             -            -        -            0/2      -            0/2
Total                   3/287        26/1058  0/53         0/47     1/20         30/1465

Note: values are CoV RNA positive specimens/total specimens; "-", no animals were captured.

To explore the diversity and evolution of CoVs in rodent 
populations we screened rodents collected from rural regions of 
Zhejiang province, China. This revealed a remarkable diversity of 
CoVs circulating in rodents, along with evidence for cross-species 
transmission and recombination. 


Results 

Collection of rodents, and the identification of coronaviruses 

A total of 1465 rodents representing 10 different species were 
captured from three locations in Zhejiang province, China during 
2011-2013 (Table 1 and Fig. 1). RT-PCR targeting a conserved 
sequence of the viral RdRp (RNA-dependent RNA polymerase) 
gene was performed to detect coronaviruses. PCR products of the 
expected size were recovered from 10 Apodemus agrarius, 4 Rattus 
norvegicus, 14 R. lossea, 1 R. tanezumi, and 1 Niviventer confucianus, 
such that approximately 2% of rodents were positive for CoV 
(Table 1). The classification of these viruses as CoVs (family
Coronaviridae, genera Alphacoronavirus and Betacoronavirus) was confirmed by genetic
analyses (see below).
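
The overall figure of approximately 2% can be checked by tallying the per-species counts in the Total column of Table 1. The following is a minimal Python sketch; the dictionary layout and the function name are illustrative choices and are not part of the study's methods.

```python
# Positive / total counts per species, transcribed from the Total column of Table 1.
COUNTS = {
    "Apodemus agrarius": (10, 444),
    "Mus musculus": (0, 7),
    "Microtus fortis": (0, 305),
    "Micromys minutus": (0, 2),
    "Niviventer confucianus": (1, 85),
    "Rattus norvegicus": (4, 262),
    "R. lossea": (14, 301),
    "R. tanezumi": (1, 53),
    "R. fulvescens": (0, 4),
    "R. edwardsi": (0, 2),
}


def prevalence(positive, total):
    """Return the percentage of CoV RNA positive animals."""
    return 100.0 * positive / total


total_pos = sum(p for p, _ in COUNTS.values())
total_n = sum(n for _, n in COUNTS.values())
overall = prevalence(total_pos, total_n)

if __name__ == "__main__":
    for species, (p, n) in COUNTS.items():
        print(f"{species:24s} {p:3d}/{n:<5d} {prevalence(p, n):5.2f}%")
    print(f"Overall: {total_pos}/{total_n} = {overall:.2f}%")
```

The per-species values make clear that positives were concentrated in a few host species rather than spread evenly across the sample.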

Genetic characterization of viral sequences 

To better characterize the rodent CoVs discovered here, complete
viral RdRp gene sequences were recovered from 21 (70%) of the RNA
positive rodent samples described above. Additionally, 1 complete
and 4 near complete (> 98%) viral genome sequences were successfully
recovered from five positive CoV samples (Table S1). Genetic
analysis indicated that two CoVs sampled from R. norvegicus in
Lucheng and Longquan shared 50.6%-71.7% nucleotide sequence
similarity with alphacoronaviruses; 15 CoVs from 13 R. lossea, 1 R.
tanezumi, and 1 N. confucianus from Longquan had 78.4%-89.5%
nucleotide sequence similarity with murine coronavirus and 76.4%-85.7%
nucleotide sequence similarity with human coronavirus HKU1;
and 13 CoVs from 10 A. agrarius, 2 R. norvegicus and 1 R. lossea from
Longquan had 60.3%-85.9% nucleotide sequence similarity with
rabbit coronavirus HKU14, isolated from domestic rabbits in Guangzhou,
China (Lau et al., 2012) (see phylogenetic results below). Overall,
we designated these newly described viruses as Longquan Aa mouse
coronavirus (LAMV), Longquan Rl rat coronavirus (LRLV), and
Lucheng Rn rat coronavirus (LRNV), reflecting their host species
and the geographic location of sampling.


Fig. 1. A map of Zhejiang province, China showing the location of trap sites in which rodents were captured and surveyed for coronaviruses.

Further comparison of the CoV replicase domains [i.e. ADP-ribose
1"-phosphatase (ADRP), chymotrypsin-like protease
(3CLpro), RdRp, helicase (Hel), 3'-to-5' exonuclease (ExoN), nidoviral
endoribonuclease specific for uridylate (NendoU) and
ribose-2'-O-methyltransferase (O-MT)] revealed that LRNV Lucheng-19
was < 90% similar in amino acid sequence to known members of
the genus Alphacoronavirus (Table S2). Hence, these data suggest
that LRNV is sufficiently divergent that it represents a novel
coronavirus species according to the criteria for species
demarcation in the subfamily Coronavirinae defined by the International
Committee on Taxonomy of Viruses (ICTV) (de Groot
et al., 2011). With respect to the other two viruses, LRLV Longquan-189
was < 90% similar to known members of the genus
Betacoronavirus in the ADRP and NendoU regions, suggesting that
it represents a new variant of murine coronavirus (see phylogenetic
analysis). Although LAMV Longquan-343 was nearly 90%
similar in the conserved replicase domains to betacoronavirus 1
(i.e. Human coronavirus OC43, Bovine coronavirus, Porcine
hemagglutinating encephalomyelitis virus, and Rabbit coronavirus
HKU14) and < 90% to other members of the genus Betacoronavirus
(Table S2), it does not exhibit sufficient sequence divergence to
represent a new virus species. Hence, we classify it as a new
variant of Betacoronavirus 1.

The viral genome sequences obtained in this study were
compared with Rhinolophus bat coronavirus HKU2, a member
of the genus Alphacoronavirus (Lau et al., 2007), and with MHV
(Cheever et al., 1949), human coronavirus HKU1 (Woo et al., 2006), and
rabbit coronavirus HKU14 (Lau et al., 2012), which are all members
of the genus Betacoronavirus (Fig. 2). A previous description of the
genome organization of CoVs (de Groot et al., 2011) was used as a
reference. LRNV (Lucheng-19) had a genome of 28,763 nucleotides,
with a G+C content of 40.2%. Its genome organization was similar
to those of members of the genus Alphacoronavirus, with the
characteristic 5'-replicase ORF1ab-S-envelope(E)-membrane(M)-N-3'
gene order (Fig. 2, Tables 2 and S3). The replicase ORF1ab (20,249
nucleotides in length) includes 16 predicted nonstructural proteins
(nsp) (Table S3). In a manner similar to Rhinolophus bat coronavirus
HKU2 and human coronavirus NL63, LRNV possessed the
core part of the putative transcription regulatory sequence (TRS)

Fig. 2. Genome organization of coronaviruses. The three CoVs discovered in this study are shown in bold. The star signifies the presence of a betacoronavirus-like NS2a gene in LRNV.

Table 2

Coding potential and putative transcription regulatory sequences of the LRNV (Lucheng-19) genome sequence.

ORF   Location (nt)                 Length  Length  TRS       Matching base pairs vs.  TRS sequence(s)
                                    (nt)    (aa)    location  leader TRS (body/leader) (distance in bases to AUG)
1ab   332-20,580 (shift at 12,538)  20,249  6749    65                                 CAACUCAACUAAACGA(251)AUG
NS2   20,577-21,404                 828     275     20,565    11/12                    CAACUUAACUAAAUG
S     21,412-24,816                 3405    1134    21,398    11/14                    UGACUAAACUAAACAUG
NS4   24,813-25,457                 645     214
E     25,457-25,693                 237     78      25,446    10/12                    CCACUUAACUAAUG
M     25,703-26,449                 747     248     25,690    9/13                     UUGAUCAACUAAAAUG
NS7a  26,461-26,958                 498     165     26,443    11/16                    GGUCUAAACUAAACCA(2)AUG
NS7b  26,555-26,926                 372     123     26,511    9/16                     CAUUAAAACUAAUUGU(28)AUG
N     26,974-28,149                 1176    391     26,960    8/14                     AGUUUCAACUAACAAUG
NS8a  27,607-28,149                 543     180     27,555    9/16                     UGAUAGAACUAAAGAA(36)AUG
NS9   28,151-28,465                 315     104     28,134    9/16                     UGAUGAAACUAAUUGA(1)AUG

Numbers in parentheses represent the number of nucleotides to the putative start codon. Start codons are underlined. The conserved TRS core sequence, AACUAA, is
highlighted in bold.


Table 3

Characteristics of the spike protein in LRLV and LAMV.

Spike protein  Strain        Signal peptide  Receptor binding domain  Cleavage site  Heptad repeat          Transmembrane domain
LRLV           Longquan-708  1-14            325-533; 586-669         754-755        1009-1122; 1257-1288   1307-1329
               Longquan-370  1-13            319-525                  762-763        996-1109; 1244-1275    1294-1316
               Longquan-189  1-13            319-525                  762-763        996-1109; 1244-1275    1294-1316
LAMV           Longquan-343  1-15            327-497                  776-777        1004-1117; 1252-1290   1302-1324


5'-AACUAA-3' upstream of the 5' end of each ORF with the
exception of NS4, with a variable number of nucleotides matching the leader core
sequence (Table 2). Additionally, LRNV contained NS7a and NS7b
genes between the M and N genes (Fig. 2), which are observed
in no other members of the genus Alphacoronavirus. Perhaps the
most striking feature of the LRNV genome was that the NS2 gene,
located between the replicase ORF1ab and the S gene, encodes a
putative nonstructural protein of 275 amino acids (Fig. 2, Table 2). A BLAST
search revealed that this NS2 had no amino acid sequence
similarity with alpha-CoVs, but possessed approximately 42%
amino acid identity with the NS2a of lineage A of beta-CoVs.
Hence, this is suggestive of a homologous recombination event
between alpha-CoVs and beta-CoVs (which was confirmed in
the phylogenetic analysis below).
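
The coordinates and TRS sequences in Table 2 lend themselves to a simple consistency check. The sketch below is illustrative only: the helper names are invented here, each ORF is assumed to end with a stop codon that is not counted in the amino acid length, and ORF1ab is assumed to be translated through a -1 ribosomal frameshift (so that one nucleotide is read twice), as is usual for coronaviruses.

```python
TRS_CORE = "AACUAA"  # conserved core sequence named in the text

# (ORF, start, end, reported length in aa, nucleotides read twice), from Table 2.
ORFS = [
    ("1ab", 332, 20580, 6749, 1),  # assumed -1 frameshift: one nt is re-read
    ("NS2", 20577, 21404, 275, 0),
    ("S", 21412, 24816, 1134, 0),
    ("NS4", 24813, 25457, 214, 0),
    ("E", 25457, 25693, 78, 0),
    ("M", 25703, 26449, 248, 0),
    ("NS7a", 26461, 26958, 165, 0),
    ("NS7b", 26555, 26926, 123, 0),
    ("N", 26974, 28149, 391, 0),
    ("NS8a", 27607, 28149, 180, 0),
    ("NS9", 28151, 28465, 104, 0),
]

# TRS sequences from Table 2, up to (not including) any "(n)AUG" spacer notation.
TRS = {
    "1ab": "CAACUCAACUAAACGA",
    "NS2": "CAACUUAACUAAAUG",
    "S": "UGACUAAACUAAACAUG",
    "E": "CCACUUAACUAAUG",
    "M": "UUGAUCAACUAAAAUG",
    "NS7a": "GGUCUAAACUAAACCA",
    "NS7b": "CAUUAAAACUAAUUGU",
    "N": "AGUUUCAACUAACAAUG",
    "NS8a": "UGAUAGAACUAAAGAA",
    "NS9": "UGAUGAAACUAAUUGA",
}


def orf_lengths(start, end, reread=0):
    """Return (length in nt, length in aa) for inclusive 1-based coordinates."""
    nt = end - start + 1
    aa = (nt + reread) // 3 - 1  # subtract the stop codon
    return nt, aa


# ORFs whose computed aa length disagrees with the reported value.
mismatches = [name for name, s, e, aa, rr in ORFS if orf_lengths(s, e, rr)[1] != aa]
# TRS entries that lack the conserved core.
missing_core = [name for name, seq in TRS.items() if TRS_CORE not in seq]

if __name__ == "__main__":
    print("length mismatches:", mismatches)
    print("TRS without core:", missing_core)
```

NS4 is absent from the TRS dictionary because no TRS is listed for it in Table 2, in line with the exception noted in the text.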

In contrast, the genome organizations of LRLV and LAMV were
similar to those of members of the lineage A CoVs of the genus
Betacoronavirus, with the characteristic 5'-ORF1ab-hemagglutinin-esterase
(HE)-S-E-M-N-3' gene order (Fig. 2). In both LAMV and
LRLV the NS5a and NS5b nonstructural proteins were located
between the S and E genes. Interestingly, however, NS5a was not
observed in the strain Longquan-370 (LRLV). The TRS of LAMV and
LRLV appeared in two forms - the CUAAAC and CCAAAC type. The
S proteins of LRLV were 1350-1366 amino acids in length, with
65%-89% amino acid identity to the S proteins of other lineage A
CoVs of the genus Betacoronavirus, while the S protein of LAMV
was 1358 amino acids in length with 62%-68% amino acid
identity to other lineage A CoVs of the genus Betacoronavirus.
Finally, the S proteins of both LRLV and LAMV contain a potential
signal peptide, a receptor binding domain, a potential S1/S2
cleavage site, two heptad repeats and one transmembrane domain
(Table 3).
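
Amino acid identity values such as those quoted above are usually derived from a pairwise alignment. The following is a minimal sketch under stated assumptions: the two sequences are already aligned to equal length with "-" marking gaps, and columns with a gap in either sequence are left out of the denominator (conventions differ between tools, so this is one option among several). The function name and the example sequences are invented for illustration.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percentage of identical residues over ungapped alignment columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Invented toy alignment: 4 of the 5 ungapped columns are identical.
    print(percent_identity("MKV-LT", "MRVALT"))
```

Because the treatment of gaps changes the denominator, identity values computed under different conventions are not directly comparable.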

Phylogenetic relationships among the CoVs 

To determine the evolutionary relationships among the novel
CoVs discovered here and those found previously, we inferred
phylogenetic trees based on the amino acid sequences of the
3CLpro, RdRp, Hel, S and N proteins (Figs. 3 and 4). Consistent with
previous work (de Groot et al., 2011), all CoVs fell into two well
supported groups, corresponding to the Alphacoronavirus and
Betacoronavirus genera respectively. With respect to the viruses
identified here, LRNV clustered within the genus Alphacoronavirus,
while LAMV and LRLV fell into the genus Betacoronavirus.

One of the most striking observations from the phylogenetic
analysis was that the sequences from LRNV occupied different
positions across the five phylogenetic trees, strongly suggestive of
recombination (Figs. 3 and 4). Specifically, in the 3CLpro and RdRp
gene trees, LRNV clustered as a member of the genus Alphacoronavirus.
LRNV also fell with alphacoronaviruses in the Hel gene
tree, but as a basal lineage. Strikingly, however, LRNV clustered
with the betacoronaviruses in the N gene tree, and formed a
divergent lineage in the S gene tree with Rhinolophus bat coronavirus
HKU2, which has previously been shown to be a recombinant (Lau
et al., 2007).

In contrast, LAMV and LRLV consistently clustered within
lineage A of the genus Betacoronavirus, which also contained
human CoV HKU1, MHV, rabbit coronavirus HKU14, and human
CoV OC43. Notably, however, in the 3CLpro, RdRp, and N
gene trees (Fig. 3), two strains (Longquan-189 and Longquan-370)
formed a monophyletic group with human coronavirus HKU1,
which was isolated from a patient with pneumonia in Hong
Kong in 2005 (Woo et al., 2005). Interestingly, Longquan-708 was
closely related to Longquan-370 and Longquan-189 in the 3CLpro,
RdRp, and Hel genes, yet clustered with MHV and rat coronavirus
in the S gene tree (Fig. 4), and represented a distinct lineage in the
N gene tree (Fig. 3). Such variable grouping suggests that
Longquan-708 may be a recombinant between two rodent CoV
lineages. Finally, in the 3CLpro, RdRp, and S gene trees, LAMV
(Longquan-343) occupied the most divergent phylogenetic position
in a group of viruses that contained human CoV OC43 (St-Jean
et al., 2004) as well as viruses sampled from domestic and wild
animals (Hasoksuz et al., 2007; Lau et al., 2012; Lim et al., 2013;
Vijgen et al., 2006). However, LAMV comprised a distinct lineage in
the Hel and N gene trees.

Fig. 3. Phylogenetic analyses of the amino acid sequences of the 3CLpro, RdRp, Hel, and N genes of Lucheng Rn rat coronavirus (LRNV) Lucheng-19, Longquan Aa mouse [...]
accession numbers of the viruses used in this analysis are shown in Table S1.

Fig. 4. Phylogenetic analyses of the amino acid sequences of the S genes of LRNV, LAMV and LRLV. Numbers (> 70) above or below branches indicate percentage bootstrap
values. The trees were mid-point rooted for clarity only. The scale bar represents the number of amino acid substitutions per site. The GenBank accession numbers of other
CoVs are given in Table S1.

To investigate these putative recombination events in more detail,
we undertook additional sequence analyses (Fig. 5 and Fig. S1).
Multiple methods within the RDP program supported statistically
significant recombination events in LRLV strain Longquan-708
(p < 1.022 × 10^-146 to p < 5.908 × 10^-17). Similarity plots suggested the
presence of three recombination breakpoints at nucleotide positions
19,294, 19,979, and 22,112, which separated the genome into four
regions (Fig. 5A). In turn, these could be grouped into two putative
'parental regions': region A (nt 1 to 19,915 and 20,600 to 22,784) and
region B (19,915 to 20,600 and 22,784 to the end of the sequence).
In parental region A, Longquan-708 was most closely related to
Longquan Rl rat coronaviruses, while in parental region B it was
more closely related to MHV. This recombination event was confirmed
by phylogenetic analyses, in which the alternative grouping of
Longquan-708 was supported with high bootstrap values (Fig. 5B
and C).
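
The grouping of genome segments into alternating parental regions can be written down directly from the region boundaries quoted above. This is a minimal sketch: the function name is illustrative, and the genome length used here is a hypothetical placeholder rather than the actual length of Longquan-708, which is not stated in this section.

```python
def parental_regions(boundaries, genome_length, labels=("A", "B")):
    """Split positions 1..genome_length at the boundaries, alternating labels."""
    edges = [1] + sorted(boundaries) + [genome_length]
    regions = []
    for i in range(len(edges) - 1):
        label = labels[i % len(labels)]
        regions.append((label, edges[i], edges[i + 1]))
    return regions


# Boundaries of parental regions A and B as given in the text (nt positions).
BOUNDARIES = [19915, 20600, 22784]
GENOME_LENGTH = 31000  # hypothetical placeholder for "the end of the sequence"

regions = parental_regions(BOUNDARIES, GENOME_LENGTH)

if __name__ == "__main__":
    for label, start, end in regions:
        print(f"region {label}: nt {start} to {end}")
```

Three boundaries yield four segments, and alternating the labels reproduces the two segments assigned to region A and the two assigned to region B.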

In contrast, although readily apparent in the amino acid phylogenies,
the recombination event involving LRNV did not receive significant
statistical support in the RDP analysis, likely because the latter
utilizes nucleotide sequences and these are highly divergent (for
example, the S protein of LRNV differs from those of alphacoronaviruses
by > 78% at the nucleotide sequence level). Similar
suggestions have previously been made with respect to recombination
in Rhinolophus bat coronavirus HKU2 (Lau et al., 2007).


Discussion 

We screened for coronaviruses in 1465 rodents representing 10
different species sampled in three locations in Zhejiang province,
southeastern China. This survey identified a novel and phylogenetically
distinct coronavirus in R. norvegicus - Lucheng Rn rat
coronavirus (LRNV) - which belonged to the genus Alphacoronavirus.
According to the criteria defined by ICTV (de Groot et al., 2011), LRNV
was sufficiently genetically distinct that it should be recognized as a
distinct species within the family Coronaviridae. However, the other
two viruses identified - Longquan Aa mouse coronavirus (LAMV) and
Longquan Rl rat coronavirus (LRLV) - belong to the established
species betacoronavirus 1 and murine coronavirus, respectively.
More generally, the presence of all three viruses indicates that
genetically diverse CoVs co-circulate in rodents in Zhejiang province.

It is notable that rodent-associated CoVs comprise a major
proportion of the known genetic diversity in lineage A CoVs of the
genus Betacoronavirus. This lineage contains viruses that cause
enteric and respiratory diseases in humans (human coronavirus
HKU1 and OC43) as well as in domestic animals (e.g. hemagglutinating
encephalomyelitis in pigs) (St-Jean et al., 2004; Vijgen
et al., 2006; Woo et al., 2005; Zhang et al., 1994). Clearly, the role
played by rodents in the evolution of lineage A CoVs of the genus
Betacoronavirus merits further investigation.

This study also provides the first evidence of CoVs of the genus
Alphacoronavirus in Rattus rats (R. norvegicus), in the form of LRNV.
However, our phylogenetic analysis suggested that this virus had
a recombinant origin, with its N gene sequence more closely
related to those of the genus Betacoronavirus (Fig. 3). Recombination
appears to be commonplace in coronaviruses (Woo et al., 2009),
particularly within closely related viruses such as MHV variants
(Smits et al., 2005). Nevertheless, only a few examples of inter-genotype
recombination, involving CoVs from bats (Hon et al., 2008)
and felines (Herrewegh et al., 1998), have been documented to date.
Hence, the observation that LRNV has a recombinant origin is
significant because it means that recombination can occur between
viruses assigned to different genera.

Alpha- and beta-CoVs are largely associated with mammals,
whereas gamma- and delta-CoVs are largely harbored by avian
species (Woo et al., 2012). Because much of the genetic diversity of
alpha- and beta-CoVs is associated with infections in bats, it has
been suggested that bats are the main reservoir hosts for both
alpha- and beta-CoVs (Woo et al., 2009). Herein, we discovered
three phylogenetically distinct lineages of rodent-associated CoVs
within a limited geographic area in China, all of which are distinct
from those viruses associated with bats. Consequently, it is clear that
large-scale surveillance is needed to fully understand the role
played by rodents in the evolution and emergence of coronaviruses.


Material and methods 

Ethics statement 

This study was reviewed and approved by the ethics committee
of the National Institute for Communicable Disease Control and
Prevention of the Chinese CDC. All animals were treated strictly
according to the guidelines for the Laboratory Animal Use and Care
from the Chinese CDC and the Rules for the Implementation of
Laboratory Animal Medicine (1998) from the Ministry of Health,
China, under the protocols approved by the National Institute for
Communicable Disease Control and Prevention. All surgery was
performed under ether anesthesia, and all efforts were made to
minimize suffering.

Specimen collection 

Rodents were trapped in cages using cooked food as bait during 
2011-2013 in Zhejiang province, China (Fig. 1) (Mills et al., 1995). 
All animals were initially identified to species by
morphological examination, and the identifications were confirmed by
sequence analysis of the mt-cyt b gene (Guo et al., 2013). All
animals were anesthetized with ether before they were sacrificed, 
and every effort was made to minimize suffering. Tissue samples 
of liver, spleen, lung, kidney, and rectum were collected from 
animals for the detection of CoVs. 

CoV detection and full genome sequencing 

Total RNA was extracted from fecal or tissue samples using
TRIzol (Invitrogen, Carlsbad, CA) according to the manufacturer's
instructions. The RNA was eluted with RNase-free water and was
used as the template for reverse transcription-PCR (RT-PCR) and
deep sequencing. CoV RNA was detected by nested RT-PCR, which
amplified the RNA-dependent RNA polymerase gene (RdRp) of
CoVs using conserved primers (sequence available on request).
Reverse transcription was undertaken using AMV reverse transcriptase
(Promega, Beijing) according to the manufacturer's protocol.
The cDNA was amplified with the following PCR protocol:
35 cycles of denaturation at 94 °C for 40 s, annealing at 44 °C for
40 s and extension at 72 °C for 40 s, with ddH2O as a negative
control. For CoV positive RNA extractions, paired-end (90 bp)
sequencing was performed on the HiSeq 2000 (Illumina) platform.
The library preparation and sequencing steps were performed by
the BGI Tech Corporation (Shenzhen, China) following a standard
protocol provided by Illumina. The resulting sequencing reads
were then assembled de novo by the Trinity program (Grabherr
et al., 2011) into 152,684 contigs (> 200 bp). BLASTx was performed
to retrieve the CoV full genome sequences from the
assembled contigs. These sequences were further verified using
Sanger sequencing methods with primers designed based on the
deep-sequencing results. To amplify the terminal ends, 3' and 5'
RACE kits (TaKaRa, Dalian, China) were used.
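
The contig length threshold mentioned above (> 200 bp) can be illustrated with a small FASTA reader and filter. This is a sketch only: it does not reproduce the Trinity assembly or the BLASTx search, the function names are illustrative, and the example records are invented.

```python
import io


def read_fasta(handle):
    """Yield (name, sequence) pairs from a FASTA-formatted text handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def filter_contigs(records, min_length=200):
    """Keep contigs strictly longer than min_length bases."""
    return [(name, seq) for name, seq in records if len(seq) > min_length]


# Invented example: contigs of 240, 100 and exactly 200 bases.
example = io.StringIO(
    ">contig1\n" + "ACGT" * 60 + "\n"
    ">contig2\n" + "AC" * 50 + "\n"
    ">contig3\n" + "A" * 200 + "\n"
)
kept = filter_contigs(read_fasta(example))

if __name__ == "__main__":
    for name, seq in kept:
        print(name, len(seq))
```

Only the 240-base contig passes, since the comparison is strict and a contig of exactly 200 bases is excluded.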

Nucleotide sequence accession numbers 

The sequences generated in this study have been deposited in
GenBank and assigned accession numbers KF294379-KF294380,
KF294358-KF294372, KF294345-KF294357 for those representing
coronaviruses, and KF294387-KF294416 for the host mt-cyt b
genes (Table S1).

Evolutionary analyses 

Because of extensive sequence divergence between the nucleotide
sequences of different coronavirus genera, all phylogenetic
analyses were based on amino acid sequences. Accordingly, amino
acid sequence alignments were performed using the MAFFT
algorithm (Katoh and Standley, 2013). After alignment, gaps and
ambiguously aligned regions were removed with Gblocks (v0.91b)
(Talavera and Castresana, 2007). Phylogenetic analyses were then
performed using the sequences of five CoV proteins: (i) 3CLpro, (ii)
RdRp, (iii) Hel, (iv) spike protein (S), and (v) the nucleocapsid
protein (N). Phylogenetic trees were estimated using the maximum
likelihood (ML) method implemented in PhyML v3.0
(Guindon et al., 2010) with bootstrap support values calculated
from 1000 replicate trees. The best-fit amino acid substitution
models (LG+Γ for 3CLpro, LG+Γ+I for Hel, RdRp, S and N) were
determined using MEGA version 5 (Tamura et al., 2011). The
following data set sizes were used in the final analysis:
3CLpro = 290 amino acids (aa), RdRp = 869 aa, Hel = 581 aa,
S = 429 aa, N = 138 aa.
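
The idea of trimming an alignment before tree inference can be illustrated with a deliberately simplified stand-in: removing every column that contains a gap. This is not the Gblocks algorithm, which also evaluates conservation and the length of conserved blocks; the function name and the toy alignment are invented for illustration.

```python
def strip_gap_columns(alignment, gap="-"):
    """Remove every column that has a gap in at least one sequence.

    alignment: dict mapping sequence name to aligned sequence (equal lengths).
    """
    seqs = list(alignment.values())
    if not seqs:
        return {}
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    # Indices of columns in which no sequence carries a gap.
    keep = [i for i in range(length) if all(s[i] != gap for s in seqs)]
    return {
        name: "".join(seq[i] for i in keep)
        for name, seq in alignment.items()
    }


if __name__ == "__main__":
    toy = {"virus_a": "MK-VL", "virus_b": "MKAV-", "virus_c": "MRAVL"}
    print(strip_gap_columns(toy))
```

Trimming of this kind explains why the final data sets listed above are shorter than the full-length proteins, most visibly for S and N.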

The TMHMM program (version 2.0; www.cbs.dtu.dk/services/TMHMM/)
was used to predict the transmembrane domains, while
the SignalP program (version 4.0; http://www.cbs.dtu.dk/services/SignalP/)
was used to determine signal sequences. Protein family
analysis was performed using PFAM and InterProScan (Apweiler
et al., 2001; Bateman et al., 2002).

Following visual inspection of the amino acid phylogenies,
potential recombination events were identified in complete genome
(nucleotide) sequences using the Recombination Detection
Program v4 (RDP4), employing the RDP, GENECONV, bootscan,
maximum chi square, Chimaera, SISCAN, and 3SEQ methods
(Martin et al., 2010) (with default parameters). All analyses were
performed with a Bonferroni corrected P-value cutoff of 0.01.
When putative recombination events were observed by two or
more methods and with significant phylogenetic (topological)
incongruence, the viral sequences were considered as potentially
recombinant. To further characterize these recombination events,
particularly the location of breakpoints, we inferred similarity
plots using Simplot version 3.5.1 (Lole et al., 1999). For each of the
putative recombinant regions, phylogenies were estimated using
the ML method performed with PhyML v3.0 (Guindon et al., 2010)
under the best-fit substitution model determined by jModelTest
(Posada, 2008).
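
The decision rule described above (support from two or more methods at the corrected cutoff, together with topological incongruence) can be expressed as a short function. This is a sketch under stated assumptions: the P-values are taken to be already Bonferroni corrected by the detection program, the function names are illustrative, and the example values are invented rather than taken from the study.

```python
def supported_methods(p_values, cutoff=0.01):
    """Return the sorted names of methods with a corrected P-value below the cutoff.

    p_values: dict mapping method name to corrected P-value, or None if the
    method reported no signal.
    """
    return sorted(
        method for method, p in p_values.items()
        if p is not None and p < cutoff
    )


def is_potential_recombinant(p_values, topological_incongruence,
                             cutoff=0.01, min_methods=2):
    """Apply the two-part rule: enough supporting methods and incongruent trees."""
    enough = len(supported_methods(p_values, cutoff)) >= min_methods
    return bool(topological_incongruence) and enough


if __name__ == "__main__":
    # Invented corrected P-values for a single candidate event.
    example = {"RDP": 1e-20, "GENECONV": 0.2, "BootScan": 3e-5, "MaxChi": None}
    print(supported_methods(example))
    print(is_potential_recombinant(example, topological_incongruence=True))
```

Requiring both statistical support and incongruent tree topologies guards against signals that a single method might report by chance.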